4 Multiple Linear Regression (MLR)
4.1 Introduction
Multiple linear regression (MLR) extends simple linear regression by allowing \(Y\) to depend on more than one predictor variable. This enables richer models and allows us to estimate the partial contribution of each predictor while accounting for others.
When to Extend SLR
SLR is limited to one predictor. MLR becomes appropriate when:
- Multiple factors plausibly influence the response.
- Excluding predictors may bias results.
- You wish to measure the unique contribution of each predictor while controlling for others.
A graphical motivation
Let's consider a simple example using the trees dataset, which records the height, girth (diameter) and volume of a sample of felled trees.
We might first look at each bivariate (i.e. two-variable) relationship separately.
However, we might also look at the multivariate relationship: in the case of three variables, we can extend our visualisation to three dimensions.
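As a sketch, the bivariate views can be produced in R with the built-in trees dataset (whose variables are named Girth, Height and Volume):

```r
# Pairwise scatterplots of girth, height and volume in the built-in trees data
data(trees)
pairs(trees, main = "Pairwise relationships in the trees data")
```

Each panel of the resulting matrix shows the scatterplot for one pair of variables, which is exactly the set of bivariate views described above.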
4.2 The multiple linear regression model
The multiple linear model extends the simple linear model from Section 2.4 straightforwardly by adding the extra variables to the deterministic linear predictor, each with its own parameter.
\[ Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon, \quad \varepsilon \sim N(0,\sigma^2) \]
Now, instead of a single predictor \(X\), we may have several (\(k\)) predictors \(x_1, x_2, \ldots, x_k\), each corresponding to a column in our dataset. We use an index to distinguish these predictors, and each predictor \(x_i\) has a corresponding parameter \(\beta_i\) governing its effect on the response. Note that we have updated the intercept parameter \(\alpha\) from the simple linear regression section to \(\beta_0\), because it is simply another parameter in the model!
The random error term, \(\varepsilon\), stays the same, so the assumptions of MLR are very similar to those of SLR:
Assumption: Assumptions of the MLR model
- A linear predictor, e.g. \(E[Y]=\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k\).
- Independent and identically distributed random error terms that:
- Have mean \(E[\varepsilon]=0\) and constant variance \(\operatorname{Var}(\varepsilon)=\sigma^2\)
- Follow a normal distribution: \(\varepsilon \sim N(0,\sigma^2)\)
Note: Multivariate Linearity
Although this is still a linear model, the term “linear” no longer refers to a literal straight line in two dimensions as it did in simple linear regression. Instead, “linear” means that the model is linear in its parameters: each \(\beta_i\) multiplies a predictor and enters additively. This linear model may live in a high-dimensional predictor space and cannot be visualised as a single line. For example, for the three-dimensional data presented above (one outcome, \(Y\), and two predictors, \(X_1\) and \(X_2\)), a linear model will instead look like a three-dimensional plane.
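To make this concrete, here is a minimal sketch of such a plane in base R, using illustrative coefficient values (not estimates from any dataset in this chapter):

```r
# Draw the mean response surface E[Y] = b0 + b1*x1 + b2*x2 over a grid of predictor values
b0 <- 2; b1 <- 0.5; b2 <- 1.5                              # illustrative coefficients
x1 <- seq(0, 10, length.out = 30)
x2 <- seq(0, 10, length.out = 30)
Ey <- outer(x1, x2, function(a, b) b0 + b1 * a + b2 * b)   # grid of fitted means
persp(x1, x2, Ey, theta = 30, phi = 25,
      xlab = "x1", ylab = "x2", zlab = "E[Y]")
```

The surface is perfectly flat: because the model is linear in its parameters, the fitted mean changes at a constant rate in each predictor direction.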
Partial Regression Coefficients
Even though our model is no longer just a straight line, the slope interpretation from simple linear regression is still relevant in multiple regression. Now however, \(\beta_i\) describes the expected change in the mean response for a one-unit increase in \(x_i\), holding all other predictors constant.
Example 1: Interpreting a partial coefficient
Suppose the fitted model for life expectancy includes internet usage and birth rate, with estimated coefficients:
Internet: 0.112
BirthRate: -0.594
Then:
- A 1-percentage-point increase in internet usage is associated with an expected 0.11-year increase in life expectancy, holding other predictors fixed.
- A one-unit increase in birth rate corresponds to an expected 0.59-year decrease in life expectancy, controlling for all other predictors.
Exercise 1: Calculating the expected value of a multiple regression model
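As a worked sketch of this kind of calculation (with made-up coefficients and predictor values, not those from the exercise): the expected value is found by substituting the predictor values into the fitted equation.

```r
# Hypothetical fitted coefficients b0, b1, b2 and a new observation with x1 = 3, x2 = 2
b <- c(1.82, 0.56, 1.49)
x_new <- c(1, 3, 2)           # the leading 1 multiplies the intercept
expected_y <- sum(b * x_new)  # b0 + b1*x1 + b2*x2
expected_y
#> [1] 6.48
```

That is, \(\hat{Y} = 1.82 + 0.56 \times 3 + 1.49 \times 2 = 6.48\).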
4.3 Fitting MLR Models in R and the tidy() function
Fitting a multiple linear regression uses the exact same conceptual process from ?sec-leastsquares: residuals are still defined as \(e = Y - \hat{Y}\), and minimising the sum of squared residuals provides optimal and unique ‘least squares estimates’ of the model parameters \(b_0, \ldots, b_k\).
MLR uses the same lm() function as SLR (these are both ‘linear models’, after all). Now, the model formula (the first argument, identified by the ~ separating the LHS and RHS) includes both predictors on the RHS, separated by a +:
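The original code chunk is not shown here, but the call would look something like the following (the data frame dat and its simulated variables are hypothetical stand-ins):

```r
# Simulate a small data frame, then fit an MLR with two predictors.
# The formula y ~ x1 + x2 places both predictors on the RHS, joined by `+`.
set.seed(1)
dat <- data.frame(x1 = runif(100), x2 = runif(100))
dat$y <- 2 + 0.5 * dat$x1 + 1.5 * dat$x2 + rnorm(100)
mod <- lm(y ~ x1 + x2, data = dat)
```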
Note: R formulas
and we can extract our model coefficients (\(b_0, b_1, b_2\)) in the same way:
mod$coefficients
(Intercept)          x1          x2
  1.8184485   0.5587775   1.4934882

coef(mod)
(Intercept)          x1          x2
  1.8184485   0.5587775   1.4934882
We can also use the tidy function from the broom package for a nice summary of the relevant inferential statistics for each parameter:
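The call producing the table below is along these lines; note that conf.int = TRUE is needed to obtain the confidence-interval columns (the model fitted here is an illustrative stand-in for mod above):

```r
library(broom)

# Illustrative model fit, standing in for `mod` above
set.seed(1)
dat <- data.frame(x1 = runif(100), x2 = runif(100))
dat$y <- 2 + 0.5 * dat$x1 + 1.5 * dat$x2 + rnorm(100)
mod <- lm(y ~ x1 + x2, data = dat)

tidy(mod, conf.int = TRUE)  # one row per term: estimate, SE, t, p-value, CI
```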
# A tibble: 3 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.82 0.388 4.68 5.21e- 6 1.05 2.58
2 x1 0.559 0.0517 10.8 1.10e-21 0.457 0.661
3 x2 1.49 0.0483 30.9 1.48e-77 1.40 1.59
The hypothesis being tested for each coefficient is analogous to that in the SLR case ?sec-beta_inference: \[ H_0 : \beta_i = 0 \qquad\text{vs}\qquad H_a : \beta_i \ne 0, \] and it is tested the same way, using the t-statistic: \[ t_i = \frac{\hat\beta_i}{\operatorname{SE}(\hat\beta_i)}. \]
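As a quick check, the t-statistic for x1 in the tidy() table above can be reproduced by hand from its estimate and standard error:

```r
estimate <- 0.5587775   # x1 estimate from the tidy() output above
se       <- 0.0517      # its standard error
t_stat   <- estimate / se
round(t_stat, 1)        # matches the reported statistic
#> [1] 10.8
```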
However, the interpretation of these hypotheses is slightly more nuanced. As pointed out above, \(\beta_i\) here does not represent the relationship between \(Y\) and \(X_i\) without qualification, but only when the other variables are held constant. Analogously, the hypothesis being tested by \(t_i\) is whether \(X_i\) is related to \(Y\) once the other predictors are accounted for.
Key point: Interpreting partial regression coefficients, \(\beta_i\)